IMPROVED ARCHITECTURE FOR GENERATING INTERMEDIATE
REPRESENTATIONS FOR PROGRAM CODE CONVERSION



The subject invention relates generally to the field
of computers and computer software and, more particularly,
to program code conversion methods and apparatus useful,
for example, in code translators, emulators and
accelerators.



Across the embedded and non-embedded CPU market, one
finds predominant Instruction Set Architectures (ISAs) for
which large bodies of software exist that could be
"Accelerated" for performance, or "Translated" to a myriad
of capable processors that could present better
cost/performance benefits, provided that they could
transparently access the relevant software. One also
finds dominant CPU architectures that are locked in time
to their ISA, cannot evolve in performance or market
reach, and would benefit from "Synthetic CPU"
co-architecture.

It is often desired to run program code written for a
computer processor of a first type (a "subject" processor)
on a processor of a second type (a "target" processor).
Here, an emulator or translator is used to perform program
code translation, such that the subject program is able to
run on the target processor. The emulator provides a
virtual environment, as if the subject program were
running natively on a subject processor, by emulating the
subject processor.

In the past, subject code has been converted to an
intermediate representation of a computer program during
run-time translation using so-called base nodes, as
described for example in WO 99/03168 entitled Program Code
Conversion, in connection with Figures 1 through 5 of this
application. Intermediate representation is a term widely
used in the computer industry to refer to forms of
abstract computer language in which a program may be
expressed, but which is not specific to, and is not
intended to be directly executed on, any particular
processor. Program code conversion methods and apparatus
which facilitate such acceleration, translation and
co-architecture capabilities utilising intermediate
representations are, for example, addressed in the
above-mentioned WO 99/03168.

According to the present invention there is provided
an apparatus and method as set forth in the appended
claims. Preferred features of the invention will be
apparent from the dependent claims, and the description
which follows.

The following is a summary of various aspects and
advantages realizable according to various embodiments of
the improved architecture for program code conversion
according to the present invention. It is provided as an
introduction to assist those skilled in the art to more
rapidly assimilate the detailed discussion of the
invention that ensues and does not and is not intended in
any way to limit the scope of the claims that are appended
hereto.






02-MAY-2003 16:06 FROM APPLEYARD LEES TO 01633814444 P.07



The various embodiments described below relate to
improved architectures for a program code conversion
apparatus and an associated method for converting subject
code executable in a subject computing environment to
target code executable in a target computing environment.
The program code conversion apparatus creates an
intermediate representation (IR) of the subject code which
may then be optimized for the target computing environment
in order to more efficiently generate the target code.
Depending upon the particular architectures of the subject
and target computing environments involved in the
conversion, the program code conversion apparatus of one
embodiment determines which of the following types of IR
nodes to generate in the intermediate representation (IR):
base nodes, complex nodes, polymorphic nodes, and
architecture-specific nodes. The program code conversion
architecture will by default generate base nodes when
creating the intermediate representation, unless it is
determined that another one of the types of nodes would be
more applicable to the particular conversion being
effectuated.






Base nodes provide a minimal set of nodes (i.e.,
abstract expressions) needed to represent the semantics of
any subject architecture running the subject code, such
that base nodes provide a RISC-like functionality.
Complex nodes are generic nodes that represent CISC-like
semantics of a subject architecture running the subject
code in a more compact representation than base nodes.
While all complex nodes could be decomposed into base node
representations with the same semantics, complex nodes
preserve the semantics of complex instructions in a single
IR node in order to improve the performance of the
translator. Complex nodes essentially augment the set of
base nodes for CISC-like instructions in the subject code.
Base nodes and complex nodes are both generally used
over a wide range of possible subject and target
architectures, thus allowing generic optimizations to be
performed on the corresponding IR trees comprised of base
nodes and complex nodes.

The program code conversion apparatus utilizes
polymorphic nodes in the intermediate representation when
the features of the target computing environment would
cause the semantics of the particular subject instruction
to be lost if realized as a generic IR node. The
polymorphic nodes contain a function pointer to a function
of the target computing environment specific to a
particular subject instruction in the source code. The
program code conversion apparatus further utilizes
architecture-specific nodes to provide target-specialized
conversion components for performing specialized code
generation functions for certain target computing
environments.

The improved IR generation methods hereafter described
allow the program code conversion apparatus to be
configurable to any subject and target processor
architecture pairing while maintaining an optimal level of
performance and maximizing the speed of translation.

For a better understanding of the invention, and to
show how embodiments of the same may be carried into
effect, reference will now be made, by way of example, to
the accompanying diagrammatic drawings in which:

Figure 1 shows an example computing environment
including subject and target computing environments;

Figure 2 shows a preferred program code conversion
apparatus;

Figure 3 is a schematic diagram of an illustrative
computing environment illustrating translation of subject
code to target code;

Figure 4 is a schematic illustration of various
intermediate representations realized by a program code
conversion apparatus in accordance with a preferred
embodiment of the present invention;

Figure 5 is a detailed schematic diagram of a
preferred program code conversion apparatus;

Figure 6 shows example IR trees generated using base
nodes and complex nodes;

Figure 7 is a schematic diagram illustrating an
example of ASN generation for implementation of the
present invention in an accelerator;

Figure 8 is a schematic diagram illustrating an
example of ASN generation for implementation of the
present invention in a translator;

Figure 9 is an operational flow diagram of the
translation process when utilizing ASNs in accordance with
a preferred embodiment of the present invention;

Figure 10 is a schematic diagram illustrating an
example of a translation process and corresponding IR
generated during the process;

Figure 11 is a schematic diagram illustrating another
example of a translation process and corresponding IR
generated during the process; and

Figure 12 is a schematic diagram illustrating a
further example of a translation process and corresponding
IR generated during the process.

The following description is provided to enable any
person skilled in the art to make and use the invention
and sets forth the best modes contemplated by the
inventors of carrying out their invention. Various
modifications, however, will remain readily apparent to those
skilled in the art, since the general principles of the
present invention have been defined herein specifically to
provide an improved architecture for a program code
conversion apparatus.

Referring to Figure 1, an example computing
environment is shown including a subject computing
environment 1 and a target computing environment 2. In
the subject environment 1, subject code 10 is executable
natively on a subject processor 12. The subject processor
12 includes a set of subject registers 14. Here, the
subject code 10 may be represented in any suitable
language with intermediate layers (e.g., compilers)
between the subject code 10 and the subject processor 12,
as will be familiar to a person skilled in the art.

It is desired to run the subject code 10 in the target
computing environment 2, which provides a target processor
22 using a set of target registers 24. These two
processors 12 and 22 may be inherently non-compatible,
such that these two processors use different instruction
sets. Hence, a program code conversion architecture 30
is provided in the target computing environment 2, in
order to run the subject code 10 in that non-compatible
environment. The program code conversion architecture 30
may comprise a translator, emulator, accelerator, or any
other architecture suitable for converting program code
designed for one processor type to program code executable
on another processor type. For the purposes of the
discussion of the present invention following hereafter,
the program code conversion architecture 30 will be
referred to as the "translator 30." It should be noted
that the two processors 12 and 22 may also be of the same
architecture type, such as in the case of an accelerator.
The translator 30 performs a translation process on the
subject code 10 and provides a translated target code 20
for execution by the target processor 22. Suitably, the
translator 30 performs binary translation, wherein subject
code 10 in the form of executable binary code appropriate
to the subject processor 12 is translated into executable
binary code appropriate to the target processor 22.
Translation can be performed statically or dynamically.
In static translation, an entire program is translated
prior to execution of the translated program on the target
processor. This involves a significant delay. Therefore,
the translator 30 preferably dynamically translates small
sections of the subject code 10 for execution immediately
on the target processor 22. This is much more efficient,
because large sections of the subject code 10 may not be
used in practice or may be used only rarely.

Referring now to Figure 2, a preferred embodiment of
the translator 30 is illustrated in more detail,
comprising a front end 31, a kernel 32 and a back end 33.
The front end 31 is configured specific to the subject
processor 12 associated with the subject code. The front
end 31 takes a predetermined section of the subject code
10 and provides a block of a generic intermediate
representation (an "IR block"). The kernel 32 optimizes
each IR block generated by the front end 31 by employing
optimization techniques, as readily known to those skilled
in the art. The back end 33 takes optimized IR blocks
from the kernel 32 and produces target code 20 executable
by the target processor 22.

Suitably, the front end 31 divides the subject code 10
into basic blocks, where each basic block is a sequential
set of instructions between a first instruction at a
unique entry point and a last instruction at a unique exit
point (such as a jump, call or branch instruction). The
kernel 32 may select a group block comprising two or more
basic blocks which are to be treated together as a single
unit. Further, the front end 31 may form iso-blocks
representing the same basic block of subject code under
different entry conditions. In use, a first predetermined
section of the subject code 10 is identified, such as a
basic block, and is translated by the translator 30
running on the target processor 22 in a translation mode.
The target processor 22 then executes the corresponding
optimized and translated block of target code 20.
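
The division into basic blocks can be sketched as follows. This is
only an illustrative sketch, not the described front end 31: the
`Insn` record, its `endsBlock` flag and the function names are
hypothetical, and the sketch assumes a block is discovered by
scanning forward from an entry point to the first jump, call or
branch instruction. Group blocks and iso-blocks are not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical decoded subject instruction; the field names are
// illustrative and do not come from the described translator.
struct Insn {
    std::string mnemonic;
    bool endsBlock;  // true for jump, call or branch instructions
};

// A basic block as a half-open index range [first, last) into the
// subject instruction sequence.
struct BasicBlock {
    std::size_t first;
    std::size_t last;
};

// Starting at 'entry', scan forward until the first control-transfer
// instruction (inclusive) or the end of the sequence.
BasicBlock nextBasicBlock(const std::vector<Insn>& code, std::size_t entry) {
    std::size_t i = entry;
    while (i < code.size()) {
        bool stop = code[i].endsBlock;
        ++i;
        if (stop) break;
    }
    return BasicBlock{entry, i};
}

// Split a whole sequence into consecutive basic blocks.
std::vector<BasicBlock> splitIntoBasicBlocks(const std::vector<Insn>& code) {
    std::vector<BasicBlock> blocks;
    std::size_t entry = 0;
    while (entry < code.size()) {
        BasicBlock b = nextBasicBlock(code, entry);
        blocks.push_back(b);
        entry = b.last;
    }
    return blocks;
}
```

A dynamic translator would more plausibly call something like
`nextBasicBlock` on demand, one entry address at a time, rather than
splitting the whole program up front.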



The translator 30 includes a plurality of abstract
registers 34, suitably provided in the kernel 32, which
represent the physical subject registers 14 that would be
used within the subject processor 12 to execute the
subject code 10. The abstract registers 34 define the
state of the subject processor 12 being emulated by
representing the expected effects of the subject code
instructions on the subject processor registers.

A structure employing such an implementation is shown
in Figure 3. As shown, compiled native subject code is
shown residing in an appropriate computer memory storage
medium 100, the particular and alternative memory storage
mechanisms being well-known to those skilled in the art.
The software components include native subject code to be
translated, translator code, translated code, and an
operating system. The translator code, i.e., the compiled
version of the source code implementing the translator, is
similarly resident on an appropriate computer memory
storage medium 102. The translator runs in conjunction
with the memory-stored operating system 104 such as, for
example, UNIX running on the target processor 106,
typically a microprocessor or other suitable computer. It
will be appreciated that the structure illustrated in
Figure 3 is exemplary only and that, for example, methods
and processes according to the invention may be
implemented in code residing with or beneath an operating
system. The translated code is shown residing in an
appropriate computer memory storage medium 108. The
subject code, translator code, operating system,
translated code and storage mechanisms may be any of a
wide variety of types, as known to those skilled in the
art.

In a preferred embodiment of the present invention,
program code conversion is performed dynamically, at
run-time, while the translated program is running in the
target computing environment. The translator 30 runs
inline with the translated program. The execution path of
the translated program is a control loop comprising the
steps of: executing translator code which translates a
block of the subject code into translated code, and then
executing that block of translated code; the end of each
block of translated code contains instructions to return
control back to the translator code. In other words, the
steps of translating and then executing the subject code
are interlaced, such that only portions of the subject
program are translated at a time.
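
The interlaced translate-then-execute control loop can be sketched
as below. This is a minimal sketch under stated assumptions: a
translated block is modelled as a callable that returns the subject
address of the next block (standing in for the instructions that
return control to the translator), and `SubjectAddr`, `kExit`,
`TranslateFn` and `runLoop` are hypothetical names introduced only
for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using SubjectAddr = std::uint32_t;

// Hypothetical marker meaning "the subject program has finished".
constexpr SubjectAddr kExit = 0xFFFFFFFFu;

// A translated block is modelled as a callable that executes and then
// hands control back by returning the address of the next block.
using TranslatedBlock = std::function<SubjectAddr()>;

// Hypothetical translator entry point: translate the basic block that
// starts at the given subject address.
using TranslateFn = std::function<TranslatedBlock(SubjectAddr)>;

// The control loop: translate one block, execute it, repeat.
// Returns the number of blocks executed.
int runLoop(SubjectAddr entry, const TranslateFn& translate) {
    int executed = 0;
    SubjectAddr pc = entry;
    while (pc != kExit) {
        TranslatedBlock block = translate(pc);  // translation step
        pc = block();                           // execution step
        ++executed;
    }
    return executed;
}
```

Only the blocks actually reached are ever translated, which is the
property the paragraph above attributes to dynamic translation.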

The translator 30's fundamental unit of translation is
the basic block, meaning that the translator 30 translates
the subject code one basic block at a time. A basic block
is formally defined as a section of code with exactly one
entry point and exactly one exit point, which limits the
block code to a single control path. For this reason,
basic blocks are the fundamental unit of control flow.

Intermediate Representation (IR) Trees

In the process of generating translated code,
intermediate representation ("IR") trees are generated
based on the subject instruction sequence. IR trees
comprise nodes that are abstract representations of the
expressions calculated and operations performed by the
subject program. The translated code is then generated
based on the IR trees. The collections of IR nodes
described herein are colloquially referred to as "trees".
We note that, formally, such structures are in fact
directed acyclic graphs (DAGs), not trees. The formal
definition of a tree requires that each node have at most
one parent. Because the embodiments described use common
subexpression elimination during IR generation, nodes will
often have multiple parents. For example, the IR of a
flag-affecting instruction result may be referred to by
two abstract registers, those corresponding to the
destination subject register and the flag result
parameter.

For example, the subject instruction (add %r1, %r2,
%r3) performs the addition of the contents of subject
registers %r2 and %r3 and stores the result in subject
register %r1. Thus, this instruction corresponds to the
abstract expression "%r1 = %r2 + %r3". This example
contains a definition of the abstract register %r1 with an
add expression containing two subexpressions representing
the instruction operands %r2 and %r3. In the context of a
subject program, these subexpressions may correspond to
other, prior subject instructions, or they may represent
details of the current instruction such as immediate
constant values.

When the "add" instruction is parsed, a new "+" IR
node is generated, corresponding to the abstract
mathematical operator for addition. The "+" IR node
stores references to other IR nodes that represent the
operands (held in subject registers, represented as
subexpression trees). The "+" node is itself referenced
by the appropriate subject register definition (the
abstract register for %r1, the instruction's destination
register). As those skilled in the art may appreciate, in
one embodiment the translator is implemented using an
object-oriented programming language such as C++. For
example, an IR node is implemented as a C++ object, and
references to other nodes are implemented as C++
references to the C++ objects corresponding to those other
nodes. An IR tree is therefore implemented as a
collection of IR node objects, containing various
references to each other.
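
One way such IR node objects might look is sketched below. The
`IRNode` type, its fields and the helper functions are hypothetical,
and the sketch uses `std::shared_ptr` to hold the references between
nodes, which is an assumption made for memory safety rather than
something the description specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical IR node: an operator or leaf label plus references to
// the nodes that represent its operands.
struct IRNode {
    std::string op;                                 // e.g. "+", "%r2"
    std::vector<std::shared_ptr<IRNode>> operands;  // empty for leaves
};
using IRNodePtr = std::shared_ptr<IRNode>;

// Create a leaf node, e.g. the current value of a subject register.
IRNodePtr makeLeaf(const std::string& name) {
    return std::make_shared<IRNode>(IRNode{name, {}});
}

// Create a "+" node that refers to its two operand subtrees.
IRNodePtr makeAdd(IRNodePtr lhs, IRNodePtr rhs) {
    return std::make_shared<IRNode>(
        IRNode{"+", {std::move(lhs), std::move(rhs)}});
}

// Render a tree as a parenthesised infix expression.
std::string toString(const IRNodePtr& n) {
    if (n->operands.empty()) return n->op;
    std::string s = "(" + toString(n->operands[0]);
    for (std::size_t i = 1; i < n->operands.size(); ++i) {
        s += " " + n->op + " " + toString(n->operands[i]);
    }
    return s + ")";
}
```

Because operands are held by reference, the same node object can be
an operand of several parents, which is what makes the structure a
directed acyclic graph rather than a strict tree.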

Abstract Registers

Further, in the embodiment under discussion, IR
generation uses a set of abstract registers 34. These
abstract registers 34 correspond to specific features of
the subject architecture. For example, there is a unique
abstract register 34 for each physical register 14 on the
subject architecture 12. Abstract registers 34 serve as
placeholders for IR trees during IR generation. For
example, the value of subject register %r2 at a given
point in the subject instruction sequence is represented
by a particular IR expression tree, which is associated
with the abstract register 34 for subject register %r2.
In one embodiment, an abstract register 34 is implemented
as a C++ object, which is associated with a particular IR
tree via a C++ reference to the root node object of that
tree.



In the example instruction sequence described above,
the translator 30 has already generated IR trees
corresponding to the values of %r2 and %r3 while parsing
the subject instructions that precede the "add"
instruction. In other words, the subexpressions that
calculate the values of %r2 and %r3 are already
represented as IR trees. When generating the IR tree for
the "add %r1, %r2, %r3" instruction, the new "+" node
contains references to the IR subtrees for %r2 and %r3.
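
The role of abstract registers as placeholders during IR generation
can be sketched as follows. All names here (`AbstractRegister`,
`AbstractRegisterFile`, `useRegister`, `realizeAdd`, the register
count of 8) are hypothetical, and the sketch assumes that a register
not yet defined in the current block is represented by a leaf node
standing for a load from the register store.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical IR node, as in a typical expression-tree design.
struct IRNode {
    std::string op;
    std::vector<std::shared_ptr<IRNode>> operands;
};
using IRNodePtr = std::shared_ptr<IRNode>;

// One abstract register per subject register. Each is a placeholder
// for the IR tree that currently defines that register's value.
struct AbstractRegister {
    IRNodePtr root;  // null until the register is used or defined
};

constexpr std::size_t kNumSubjectRegs = 8;  // illustrative size only
using AbstractRegisterFile = std::array<AbstractRegister, kNumSubjectRegs>;

// Return the tree that defines register r. If the current block has
// not defined it yet, create a leaf standing for a load from the
// register store (an assumption of this sketch).
IRNodePtr useRegister(AbstractRegisterFile& regs, std::size_t r) {
    if (!regs[r].root) {
        regs[r].root =
            std::make_shared<IRNode>(IRNode{"@%r" + std::to_string(r), {}});
    }
    return regs[r].root;
}

// Realize "add %rd, %ra, %rb": build a "+" node over the existing
// operand trees and make it the new definition of %rd.
void realizeAdd(AbstractRegisterFile& regs, std::size_t rd,
                std::size_t ra, std::size_t rb) {
    IRNodePtr lhs = useRegister(regs, ra);
    IRNodePtr rhs = useRegister(regs, rb);
    regs[rd].root = std::make_shared<IRNode>(IRNode{"+", {lhs, rhs}});
}
```

Calling `realizeAdd(regs, 1, 2, 3)` leaves the abstract register for
%r1 referring to a "+" node whose operands are the very node objects
already associated with %r2 and %r3.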

The implementation of the abstract registers 34 is
divided between components in both the translator 30 and
the translated code. In the context of the translator, an
abstract register is a placeholder used in the course of
IR generation, such that the abstract register 34 is
associated with the IR tree that calculates the value of
the subject register 14 to which a particular abstract
register 34 corresponds. As such, abstract registers 34
in the translator may be implemented as a C++ object which
contains a reference to an IR node object (i.e., an IR
tree). In the context of the translated code, an abstract
register 34 is a specific location within the abstract
register store, to and from which subject register 14
values are synchronized with the actual target registers
24. Alternatively, when a value has been loaded from the
abstract register store, an abstract register 34 in the
translated code could be understood to be the target
register 24 which temporarily holds a subject register
value during the execution of the translated code, prior
to being saved back to the register store.

An example of program translation as described is
illustrated in Figure 4. Figure 4 shows the translation of
two basic blocks of x86 instructions, and the corresponding
IR trees that are generated in the process of translation.
The left side of Figure 4 shows the execution path of the
emulator during translation. The translator 30 translates
151 a first basic block of subject code 153 into target
code and then executes 155 that target code. When the
target code finishes execution, control is returned to the
emulator 157. The translator 30 then translates 157 the
next basic block of subject code 159 into target code and
executes 161 that target code, and so on.

In the course of translating 151 the first basic block
of subject code 153 into target code, the translator 30
generates an IR tree 163 based on that basic block. In
this case, the IR tree 163 is generated from the source
instruction "add %ecx, %edx," which is a flag-affecting
instruction. In the course of generating the IR tree 163,
four abstract registers are defined by this instruction:
the destination subject register %ecx 167, the first
flag-affecting instruction parameter 169, the second
flag-affecting instruction parameter 171, and the
flag-affecting instruction result 173. The IR tree
corresponding to the "add" instruction is simply a "+"
(arithmetic addition) operator 175, whose operands are the
subject registers %ecx 177 and %edx 179.






Emulation of the first basic block puts the flags in a
pending state by storing the parameters and result of the
flag-affecting instruction. The flag-affecting
instruction is "add %ecx, %edx." The parameters of the
instruction are the current values of emulated subject
registers %ecx 177 and %edx 179. The "@" symbol preceding
the subject register uses 177, 179 indicates that the
values of the subject registers are retrieved from the
global register store, from the locations corresponding to
%ecx and %edx, respectively, as these particular subject
registers were not previously loaded by the current basic
block. These parameter values are then stored in the
first 169 and second 171 flag parameter abstract
registers. The result of the addition operation 175 is
stored in the flag result abstract register 173.

After the IR tree is generated, the corresponding
target code is generated based on the IR. The process of
generating target code from a generic IR is well
understood in the art. Target code is inserted at the end
of the translated block to save the abstract registers,
including those for the flag result 173 and the flag
parameters 169, 171, to the global register store. After
the target code is generated, it is then executed 155.

In the course of translating 157 the second basic
block of subject code 159, the translator 30 generates an
IR tree 165 based on that basic block. The IR tree 165 is
generated from the source instruction "pushf," which is a
flag-using instruction. The semantics of the "pushf"
instruction are to store the values of all condition flags
onto the stack, which requires that each flag be
explicitly calculated. As such, the abstract registers
corresponding to four condition flag values are defined
during IR generation: the zero flag ("ZF") 181, the sign
flag ("SF") 183, the carry flag ("CF") 185, and the
overflow flag ("OF") 187. Node 195 is the arithmetic
comparison operator "unsigned less-than." The calculation
of the condition flags is based on information from the
prior flag-affecting instruction, which in this case is
the "add %ecx, %edx" instruction from the first basic
block 153. The IR calculating the condition flag values
165 is based on the result 189 and parameters 191, 193 of
the flag-affecting instruction. As above, the "@" symbol
preceding the flag parameter labels indicates that the
emulator inserts target code to load those values from the
global register store prior to their use.

Thus, the second basic block forces the flag values to
be normalized. After the flag values are calculated and
used (by the target code emulating the "pushf"
instruction), they will be stored into the global register
store. Simultaneously, the pending flag abstract
registers (parameters and result) are put into an
undefined state to reflect the fact that the flag values
are stored explicitly (i.e., the flags have been
normalized).
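
The pending-then-normalized handling of flags can be illustrated
with the sketch below. It is not the IR of Figure 4: the type and
function names are hypothetical, and the per-flag formulas are the
standard ones for a 32-bit x86 addition, chosen here as an
assumption. The carry computation uses the same "unsigned less-than"
comparison that the text attributes to node 195.

```cpp
#include <cassert>
#include <cstdint>

// Pending flag state recorded by a flag-affecting "add": the two
// parameters and the result, with no flag computed yet.
struct PendingAddFlags {
    std::uint32_t param1;
    std::uint32_t param2;
    std::uint32_t result;
};

// Explicit (normalized) condition flag values.
struct Flags {
    bool zf;  // zero flag
    bool sf;  // sign flag
    bool cf;  // carry flag
    bool of;  // overflow flag
};

// Emulate the add and leave the flags pending.
PendingAddFlags emulateAdd(std::uint32_t a, std::uint32_t b) {
    return PendingAddFlags{a, b, a + b};  // wraps modulo 2^32
}

// Normalize: compute each flag from the stored parameters and result,
// as a flag-using instruction such as "pushf" would require.
Flags normalize(const PendingAddFlags& p) {
    Flags f;
    f.zf = (p.result == 0);
    f.sf = (p.result >> 31) != 0;
    // Carry out of bit 31 occurred iff the result is unsigned
    // less-than a parameter.
    f.cf = p.result < p.param1;
    // Signed overflow: both parameters share a sign bit that differs
    // from the sign bit of the result.
    f.of = ((~(p.param1 ^ p.param2) & (p.param1 ^ p.result)) >> 31) != 0;
    return f;
}
```

If no flag-using instruction follows before the next flag-affecting
one, `normalize` is never called and the cost of computing four
flags is avoided entirely.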

Figure 5 shows the translator 30 formed in accordance
with a preferred embodiment of the present invention
capable of generating several different types of IR nodes
that may be used in translation, as well as illustrating
how the implementations of those different types of IR
nodes are distributed between the frontend 31, kernel 32,
and backend 33 components of the translator 30. The term
"realize" refers to IR generation, which is performed in
the frontend 31 as subject instructions of the subject
code 10 are decoded (i.e., parsed). The term "plant"
refers to target code generation, which is performed in
the backend 33.

Note that while the translation process is described
below in terms of a single subject instruction, these
operations actually take place for an entire basic block
of subject instructions at once as described above. In
other words, the entire basic block is initially decoded
to generate an IR forest, then the kernel 32 applies
optimizations to the whole IR forest. Lastly, the backend
33 performs target code generation for the optimized IR
forest one node at a time.

When generating an IR forest for a basic block, the
translator 30 may generate either base nodes, complex
nodes, polymorphic nodes, or architecture-specific nodes
(ASNs), or any combination thereof, depending upon the
desired translator performance and the particular
architectures of the source processor and target processor
pairing.

Base Nodes 

Base nodes are abstract representations of the
semantics (i.e., the expressions, calculations, and
operations) of any subject architecture and provide the
minimal set of standard or basic nodes needed to represent
the semantics of the subject architecture. As such, base
nodes provide simple Reduced Instruction Set Computer
(RISC)-like functionality, such as, for instance, an "add"
operation. In contrast to other types of nodes, each base
node is irreducible, meaning that it cannot be broken down
any further into other IR nodes. Due to their simplicity,
base nodes are also easily translated by the translator 30
into target instructions on all backends 33 (i.e., target
architectures).

When utilizing only base IR nodes, the translation
process takes place entirely at the top portion of Figure
5 (i.e., paths traveling through the "Base IR" block 204).
The front-end 31 decodes a subject instruction from the
subject program code 10 in decode block 200, and realizes
(generates) in realize block 202 a corresponding IR tree
of base nodes. The IR tree of base nodes is passed from the
front-end 31 to the Base IR block 204 in the kernel 32,
where optimizations are applied to an entire IR forest. As
the IR forest optimized by the Base IR block 204 consists
only of base nodes, it is entirely generic to any processor
architecture. The optimized IR forest is then passed from
the Base IR block 204 in the kernel 32 to the backend 33,
which plants (generates) corresponding target code
instructions for each IR node in Plant block 206. The
target code instructions are then encoded by encode block
208 for execution by the target processor.

As noted above, base nodes are easily translated into
target instructions on all backends 33, and the
translated code can typically be generated entirely
through exclusive utilization of base nodes. While the
exclusive use of base nodes is very quick to implement for
the translator 30, it yields suboptimal performance in the
translated code. In order to increase the performance of
the translated code, the translator 30 can be specialized
to exploit features of the target processor architecture
by using alternative types of IR nodes, such as complex
nodes, polymorphic nodes, and architecture-specific nodes
(ASNs).



Complex Nodes



Complex nodes are generic nodes that represent the
semantics of a subject architecture in a more compact
representation than base nodes. Complex nodes provide a
"Complex Instruction Set Computer (CISC)-like"
functionality such as "add_imm" (add register and
immediate constant), for example. Specifically, complex
nodes typically represent instructions with immediate
constant fields. Immediate-type instructions are
instructions in which a constant operand value is encoded
into the instruction itself in an "immediate" field. For
constant values that are small enough to fit into
immediate fields, such instructions avoid the use of one
register to hold the constant. For complex instructions,
complex nodes can represent the semantics of the complex
instructions with far fewer nodes than equivalent
base-node representations characterizing the same semantics.
While complex nodes can essentially be decomposed into
base node representations having the same semantics,
complex nodes are useful in preserving the semantics of
immediate-type instructions in a single IR node, thus
improving the performance of the translator 30.
Furthermore, in some situations, the semantics of the
complex instructions would be lost by representing the
complex instructions in terms of base nodes, and complex
nodes thus essentially augment the base node set to
include IR nodes for such "CISC-like" instructions.

With reference to Figure 6, an example of the
efficiency achieved by using a complex node as compared to
that of base nodes will now be described. For example,
the semantics of the MIPS add-immediate instruction "addi
r1,#10" adds ten to the value held in register r1. Rather
than loading the constant value (10) into a register and
then adding two registers, the addi instruction simply
encodes the constant value 10 directly into the
instruction field itself, thus avoiding the need to use a
second register. When generating an intermediate
representation of these semantics strictly using base
nodes, the base node representation for this instruction
would first load the constant value 10 from the const(#10)
node 60 into a register node r(x) 61, and then perform an
addition of the register node r1 62 and register node r(x)
61 using add node 63. The complex node representation
consists of a single "add to immediate" IR node 70 that
contains the constant value 10 at portion 72 of the node
70 and a reference to register r1 74. In the base node
scenario, the backend 33 would need to perform idiom
recognition capable of recognizing the four-node pattern
shown in Figure 6, in order to recognize and generate an
"add to immediate" target instruction. In the absence of
idiom recognition, the backend 33 would emit an extra
instruction to load the constant value 10 into a register
prior to performing a register-register addition.
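
The comparison of Figure 6 can be sketched in C++ as
follows. This is a minimal illustration under stated
assumptions, not the translator's actual data structures:
the `Node` layout, the `Kind` names, the scratch register
name `rx` and the emitted mnemonics are all hypothetical.
The sketch builds both representations of "addi r1,#10" and
emits code with a naive walk that performs no idiom
recognition.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node kinds: three base kinds and one complex kind.
enum class Kind { Const, Reg, Add, AddImm };

struct Node {
    Kind kind;
    int imm;                                  // constant value (Const, AddImm)
    std::string reg;                          // register name (Reg, AddImm)
    std::vector<std::shared_ptr<Node>> kids;  // operand subtrees
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(Kind k, int imm, std::string reg, std::vector<NodePtr> kids = {}) {
    return std::make_shared<Node>(Node{k, imm, std::move(reg), std::move(kids)});
}

// Base node form: const(#10) -> r(x), then add(r1, r(x)). Four nodes.
NodePtr baseAddi() {
    NodePtr c  = make(Kind::Const, 10, "");
    NodePtr rx = make(Kind::Reg, 0, "rx", {c});
    NodePtr r1 = make(Kind::Reg, 0, "r1");
    return make(Kind::Add, 0, "", {r1, rx});
}

// Complex node form: one node holding both the constant and the register.
NodePtr complexAddi() { return make(Kind::AddImm, 10, "r1"); }

std::size_t countNodes(const NodePtr& n) {
    std::size_t total = 1;
    for (const NodePtr& k : n->kids) total += countNodes(k);
    return total;
}

// Naive emission with no idiom recognition: a register fed by a
// constant costs a separate load instruction.
void emitInto(const NodePtr& n, std::vector<std::string>& out) {
    switch (n->kind) {
    case Kind::Const:
        break;  // materialized by the parent register node
    case Kind::Reg:
        if (!n->kids.empty())
            out.push_back("li " + n->reg + ", " + std::to_string(n->kids[0]->imm));
        break;
    case Kind::Add:
        for (const NodePtr& k : n->kids) emitInto(k, out);
        out.push_back("add " + n->kids[0]->reg + ", " + n->kids[0]->reg +
                      ", " + n->kids[1]->reg);
        break;
    case Kind::AddImm:
        out.push_back("addi " + n->reg + ", " + n->reg + ", " + std::to_string(n->imm));
        break;
    }
}

std::vector<std::string> emit(const NodePtr& n) {
    std::vector<std::string> out;
    emitInto(n, out);
    return out;
}
```

Under these assumptions the base form costs four nodes and
two emitted instructions, while the complex form costs one
node and one instruction.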

Complex nodes reduce the need for idiom recognition in
the backend 33, because complex nodes contain more
semantic information than their base node equivalents.
Specifically, complex nodes avoid the need for backend 33
idiom recognition of constant operands. By comparison, if
an immediate-type subject instruction were decomposed into
base nodes (and the target architecture also contained
immediate-type instructions), then the translator 30 would
either need expensive backend 33 idiom recognition to
identify the multiple-node cluster as an immediate
instruction candidate, or generate inefficient target code
(i.e., more instructions than necessary, using more target
registers than necessary). In other words, by utilizing
base nodes alone, performance is lost either in the
translator 30 (through idiom recognition) or the
translated code (through extra generated code without
idiom recognition). More generally, because complex nodes
are a more compact representation of semantic information,
they reduce the number of IR nodes that the translator 30
must create, traverse, and delete.

Immediate-type instructions are present in many
architectures. Therefore, complex nodes are generic in
that they are reusable across a range of architectures.
However, not every complex node is present in the IR node
set of every translator. Certain generic features of the
translator are configurable, meaning that when a
translator is being compiled for a particular pair of
source and target architectures, features that do not
apply to that translator configuration can be excluded
from compilation. For example, in a MIPS-MIPS (MIPS to
MIPS) translator, complex nodes that do not match the
semantics of any MIPS instructions are excluded from the
IR node set because they would never be utilized.

Complex nodes can further improve the performance of
the target code generated using an in-order traversal.
In-order traversal is one of several alternative IR
traversal algorithms that determines the order in which IR
nodes within an IR tree are generated into target code.
Specifically, in-order traversal generates each IR node as
it is first traversed, which precludes backend 33 idiom
recognition due to the absence of a separate optimization
pass over the entire IR tree. Complex nodes represent
more semantic information per node than base nodes, and
thus some of the work of idiom recognition is implicit
within the complex nodes themselves. This allows the
translator 30 to use in-order traversal without suffering
as much of a penalty in target code performance as it would
with base nodes alone.

When the translator 30 generates complex nodes, the
translation process is otherwise the same as the
translation process described above for base nodes. The
only difference is that subject instructions that match
the semantics of a complex node are realized as complex
nodes in Realize block 202 rather than as base nodes (as
illustrated by the dotted line separating Realize
block 202). Complex nodes are still generic across a wide
range of architectures, which enables the kernel 32
optimizations to still apply to the entire IR forest.
Furthermore, code generated from complex nodes can be more
efficient than that generated from the base node
equivalents.

Polymorphic Nodes

A preferred embodiment of the translator 30 may utilize
polymorphic nodes to provide specialized code generation
to efficiently utilize target architecture features for
specific, performance-critical subject instructions. The
polymorphic mechanism is implemented as a generic
polymorphic node which contains a function pointer to a
backend 33 specialized code generation function. Each
function pointer is specialized to a particular subject
instruction. This polymorphic mechanism preempts the
standard frontend 31 decoding mechanism, which would
otherwise decode the subject instruction into base or
complex nodes. Without the polymorphic mechanism, the
generation of those base nodes would, in the backend 33,
either result in suboptimal target code or require
expensive idiom recognition to reconstruct the semantics
of the subject instruction.

Each polymorphic function is specific to a particular
subject instruction and target architecture function
pairing. Polymorphic nodes effectively bypass the kernel
32 because they do not contain any generic semantics for
the kernel 32 to optimize. Polymorphic nodes do not
participate in frontend 31 optimizations because, after
mapping the subject instruction to the corresponding
backend 33 function pointer, they do not retain any
semantic information of the subject instruction. Once the
polymorphic node is created, all frontend 31 semantic
information is lost. Thus, the kernel 32 cannot see the
semantics of the target instructions to be planted, and
therefore the polymorphic nodes are always left out of the
kernel 32 optimisations. Polymorphic nodes are therefore
implemented only for a small subset of subject
instructions, ones that occur frequently (i.e., are
critical to performance) and for which generic code
generation works poorly but specialized code generation
works well. In other words, polymorphic nodes are
typically used only for scenarios where target processor
architectural features dictate that a particular subject
instruction can be generated efficiently as target code,
but if the subject instruction is decoded into base nodes
the semantics necessary to exploit that efficiency are
lost. Otherwise, subject instruction decoding uses the
default nodes (base nodes or complex nodes) to maintain
platform-independence and eligibility for generic
optimizations.

In order for the translator 30 to utilize polymorphic
nodes (i.e., the path through polymorphic IR block 212),
the backend 33 provides a list of subject
instruction-target function pointer pairs to the
frontend 31. Subject instructions that are on the
provided list are realized as polymorphic nodes containing
the corresponding backend 33 function pointer. Subject
instructions that are not on the list are realized as
complex or base IR trees, as discussed above. In Figure 5,
the path 214 reflects the list of subject
instruction-target function pointer pairs being provided
to the realize block at the frontend 31. While the
frontend 31 performs realization in the realize block
(i.e., the mapping of subject instructions to IR nodes),
the process is modified by information received from the
backend 33 through path 214.

In the polymorphic IR block 212 of the kernel 32,
polymorphic nodes do not participate in generic
optimisations, because the kernel 32 cannot see their
semantics. In other words, polymorphic nodes effectively
bypass the kernel. This is illustrated in Figure 5 by
showing the polymorphic IR block 212 in dashed lines.
Polymorphic nodes contain target function pointers which
point to target code generation functions in the backend
33 [...]. This situation is different from [...], in that
the polymorphic function is encoded directly in the node.
The minimal work that the backend 33 performs at the
polymorphic plant block 216 is reflected in Figure 5 by
the fact that block 216 is contiguous with both of its
neighboring blocks (i.e., no arrows designating nontrivial
computations are shown between the polymorphic IR block
212 and the polymorphic plant block 216).
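
The polymorphic mechanism described above can be sketched
in C++ as follows. This is a minimal illustration rather
than the translator's actual code: the type names, the
`realize` and `plant` helpers, and the placeholder strings
standing in for planted target code are all assumptions
made for the example.

```cpp
#include <map>
#include <string>
#include <vector>

// Signature of a backend specialized code generation function
// (hypothetical: it returns the planted target code as text).
using CodeGenFn = std::vector<std::string> (*)();

struct IRNode {
    bool polymorphic;    // true when the node only carries a function pointer
    CodeGenFn gen;       // backend function, set only for polymorphic nodes
    std::string opcode;  // subject semantics, kept only for default nodes
};

// Hypothetical backend function for one performance-critical instruction.
std::vector<std::string> genShl64() {
    return {"<specialized target code for SHL64>"};
}

// The backend supplies subject instruction / function pointer pairs.
std::map<std::string, CodeGenFn> backendPolymorphicList() {
    return {{"SHL64", &genShl64}};
}

// Frontend realization: listed instructions become polymorphic nodes and
// lose their subject semantics; all others keep the default path.
IRNode realize(const std::string& subjectOp,
               const std::map<std::string, CodeGenFn>& list) {
    auto it = list.find(subjectOp);
    if (it != list.end()) return IRNode{true, it->second, ""};
    return IRNode{false, nullptr, subjectOp};
}

// The kernel can only optimize nodes whose semantics it can see.
bool kernelCanOptimize(const IRNode& n) { return !n.polymorphic; }

// Code generation: a polymorphic node simply invokes its function pointer.
std::vector<std::string> plant(const IRNode& n) {
    if (n.polymorphic) return n.gen();
    return {"<generic code for " + n.opcode + ">"};
}
```

The design choice illustrated here is the trade-off the
text describes: the polymorphic node gains a direct route
to specialized code generation, at the price of being
opaque to kernel optimizations.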

Example 1: Polymorphic IR Example

To illustrate the process of optimizing the translator
30 for utilizing polymorphic nodes in the IR, the
following example describes the translation of a PPC
(PowerPC) SHL64 instruction (left shift, 64-bit)
required in a PPC-P4 (PowerPC to Pentium4) translator
using first base nodes and then polymorphic nodes.

Without optimizing the translator for the
implementation of polymorphic nodes, the translation of
the PPC SHL64 instruction would use only base nodes:

PPC SHL64 => Base IR multiple nodes => P4 multiple
instructions

The frontend decoder 200 of an unoptimized translator
decodes the current block and encounters the PPC SHL64
instruction. Next, the frontend realize block 202
instructs the kernel 32 to construct an IR consisting of
multiple base nodes. Then the kernel 32 optimizes the IR
forest (generated from the current block of instructions)
and performs an ordering traversal to determine the order
of code generation in Base IR block 204. Next, the kernel
32 performs code generation for each IR node in order,
instructing the backend 33 to plant appropriate RISC-type
instructions. Finally, the backend 33 plants code in
plant block 206 and encodes each RISC-type instruction
with one or more target architecture instructions in
encode block 208.

When optimized for a specific target architecture by
specialization of the frontend 31 and backend 33 for
performance-critical instructions:

PPC SHL64 => Poly IR single node => P4 single/few
instructions

The frontend decoder 200 of the optimized translator
30 decodes the current block and encounters the PPC SHL64
instruction. Next, the frontend realize block 202
instructs the kernel 32 to construct an IR consisting of
a single polymorphic IR node. The kernel 32 then optimizes
the IR forest for the current block and performs an
ordering traversal to fix the code generation order in the
polymorphic IR block 212. Next, the kernel 32 performs
code generation for each node, instructing the backend 33
to plant appropriate RISC-type instructions. During code
generation, however, polymorphic nodes are treated
differently than base nodes. Each polymorphic node causes
the invocation of a specialized code generator function
which resides in the backend 33. The backend 33
specialized code generator function plants code in plant
block 216 and encodes each subject architecture
instruction with one or more target architecture
instructions in encode block 208. This code generation
may involve register allocation for temporary registers.



Example 2: Difficult Instructions

The following example illustrates translation and
optimization of the PPC MFFS instruction (move 32-bit FPU
control register to 64-bit general FPU register) which
would be performed by the translator 30 of the present
invention. This subject instruction is too complex to be
represented by base nodes.

In the unoptimized case, this instruction would be
translated using a substitute function. Substitute
functions are explicit translations for special cases of
subject instructions that are particularly difficult to
translate using the standard translation scheme.
Substitute function translations are implemented as target
code functions that perform the semantics of the subject
instruction. They incur a much higher execution cost than
the standard IR instruction-based translation scheme. The
unoptimized translation scheme for this instruction is
thus:

PPC MFFS instruction => Base IR substitute function =>
P4 substitute function

In a translator 30 using polymorphic IR, such special
case instructions are translated using a polymorphic node.
The polymorphic node's function pointer provides a more
efficient mechanism for the backend 33 to supply a custom
translation of the difficult subject instruction. The
optimized translation scheme for the same instruction is
thus:

PPC MFFS instruction => single Polymorphic IR node =>
P4 SSE2 instructions

Architecture-Specific Nodes

In another preferred embodiment of the translator 30
of the present invention, the translator 30 may utilize
architecture-specific nodes (ASNs), as shown in Figure 5,
which are specific to particular architectures (i.e., a
particular source architecture-target architecture
combination). Each architecture-specific node (ASN) is
specifically tailored to a particular instruction, thus
rendering ASNs specific to particular architectures. When
utilizing the ASN mechanism, architecture-specific
optimisations can be implemented which comprehend the
ASNs' semantics and can therefore operate on the ASNs.

IR nodes may contain up to three components: a data
component, an implementation component, and a conversion
component. The data component holds any semantic
information which is not inherent in the node itself
(e.g., the value of a constant immediate instruction
field). The implementation component performs code
generation, and, therefore, is specifically related to a
particular architecture. The conversion component converts
the node into IR nodes of a different type, either target
ASN nodes or base nodes. In a given implementation of the
present invention in a translator, each base node and ASN
in the generated IR contains either a conversion
component or an implementation component, but not both.

Each base node has an implementation component which
is specific to the target architecture. Base nodes do not
have conversion components, because base nodes encode the
least possible amount of semantic information in the IR
node hierarchy, and thus converting base nodes into other
types of IR nodes would not provide any benefit. Any
conversion of base nodes into other types of IR nodes
would require the re-collection of semantic information
through idiom recognition.

The implementation component of an ASN is specific to
the node's architecture, such that it generates an
architecture-specific instruction corresponding to that
ASN. For example, the implementation component of a
MIPSLoad ASN generates a MIPS "ld" (load) instruction.
When using the translator of the present invention with
the same subject and target architectures (i.e., as an
accelerator), subject ASNs will possess implementation
components. When utilizing the translator with different
subject and target architectures, subject ASNs will have
conversion components.

For example, Figure 7 illustrates the ASN for a MIPS
instruction when using an embodiment of the present
invention in a MIPS-MIPS accelerator. The frontend 31
decodes the MIPS "addi" (add immediate) instruction 701
and generates an IR to include the corresponding ASN,
MIPS_ADDI 703. The subject and target architectures are
the same for an accelerator, and thus the conversion
component "CVT" 707 is undefined. The implementation
component "IMPL" 705 is defined to generate the same MIPS
"addi" instruction 709, subject to register allocation
differences in the code generation pass.

Figure 8 illustrates the ASNs in the IR for the same
MIPS instruction when using an embodiment of the present
invention in a MIPS-X86 translator. The frontend 31
decodes the MIPS "addi" subject instruction and generates
a corresponding subject ASN, MIPS_ADDI 801. The source
and target architectures are different for this
translator, and the implementation component 803 of the
subject ASN 801 is thus undefined. The conversion
component 805 of the MIPS_ADDI is a specialized conversion
component, which converts the subject ASN 801 into a
target ASN 807. By comparison, a generic conversion
component would convert the subject ASN 801 into a base
node representation. The target ASN representation of the
MIPS_ADDI node 801 is a single X86 ADDI node 807. The
conversion component 811 of the target ASN 807 is
undefined. The implementation component 809 of the target
ASN 807 generates a target instruction 813, in this
case the X86 instruction "ADD $EAX, #10".

When the translator 30 is utilizing ASNs, all subject
instructions are realized as subject-specific ASNs. In
Figure 5, the fact that the frontend decode block 200, the
ASN realize block, and the subject ASN block are
contiguous with each other represents the fact that the
ASNs are defined by the frontend 31 and that realization
is trivial, because there is a one-to-one relationship
between subject instruction types and subject ASN types.
The frontend 31 contains subject-specific optimizations
which understand the semantics of, and can operate on,
subject ASNs. In other words, the subject code is
initially realized as an IR forest consisting entirely of
subject ASNs, to which subject-specific optimizations are
then applied.
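
The component structure of an ASN, and the two cases of
Figures 7 and 8, can be sketched in C++ as follows. This is
an illustrative sketch only: the `Asn` layout, the use of
`std::function` for the implementation and conversion
components, the fixed register `r1`, and the assumption
that the subject register maps to EAX are hypothetical
choices made for the example.

```cpp
#include <functional>
#include <memory>
#include <string>

struct Asn;
using AsnPtr = std::shared_ptr<Asn>;

struct Asn {
    std::string name;                             // e.g. "MIPS_ADDI"
    int imm;                                      // data component
    std::function<std::string(const Asn&)> impl;  // implementation component
    std::function<AsnPtr(const Asn&)> cvt;        // conversion component
};

// Figure 7 case (accelerator): implementation defined, conversion undefined.
Asn mipsAddiAccelerator(int imm) {
    return Asn{"MIPS_ADDI", imm,
               [](const Asn& n) { return "addi r1, r1, " + std::to_string(n.imm); },
               nullptr};
}

// Figure 8 case (MIPS to X86): conversion defined, implementation undefined.
// The specialized conversion yields a target ASN that has an implementation
// component and no conversion component.
Asn mipsAddiToX86(int imm) {
    return Asn{"MIPS_ADDI", imm, nullptr,
               [](const Asn& n) {
                   return std::make_shared<Asn>(Asn{
                       "X86_ADDI", n.imm,
                       [](const Asn& t) { return "ADD $EAX, #" + std::to_string(t.imm); },
                       nullptr});
               }};
}

// Code generation: convert first when a conversion component exists,
// otherwise invoke the implementation component directly.
std::string generate(const Asn& n) {
    if (n.cvt) return generate(*n.cvt(n));
    return n.impl(n);
}
```

In both cases each node carries exactly one of the two
optional components, matching the rule that a node has
either a conversion component or an implementation
component, but not both.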

By default, a subject ASN has a generic conversion
component which generates an IR tree of base nodes. This
allows support for a new subject architecture to be
implemented quickly using generic IR nodes. Subject ASNs
are realized as base nodes through the path extending
through the ASN Base IR block 222 and plant block 206 in
Figure 5, which are translated into target code in a
similar manner to other base nodes as described in detail
above.

For subject instructions that are significant to
performance, the corresponding subject ASN nodes provide
specialized conversion components, which generate IR trees
of target ASN nodes. Factors considered in whether to
implement a specialized conversion component include (1)
whether the target architectural features provide for
particularly efficient translation that would be lost in a
base node translation and (2) whether a subject
instruction occurs with such frequency that it has a
significant impact on performance. These specialized
conversion components are specific to the subject-target
architecture pair. Target ASNs (which by definition have
the same architecture as the target) include
implementation components.

When implementing the specialized conversion
components, the corresponding subject ASN nodes provide
target-specialized conversion components which convert the
subject ASNs into target ASNs through the target ASN block
224. The target ASN's implementation component is then
invoked to perform code generation in the target ASN plant
block 226. Each target ASN corresponds to one particular
target instruction, such that the code generated from a
target ASN is simply the corresponding target instruction
that the ASN encodes. As such, code generation using
target ASNs is computationally minimal (represented in
Figure 5 by the illustration of the target ASN plant block
226 being contiguous with both of its neighboring blocks,
with no arrows designating nontrivial computations being
shown between these components). Furthermore, the IR
traversal, conversion, and code generation processes are
all controlled by the kernel 32.

Figure 9 illustrates the translation process performed
in accordance with a preferred embodiment of the
translator of the present invention that utilizes the ASN
mechanism. In the frontend 31, the translator decodes the
subject code 901 in step 903 into subject ASNs 904. The
translator performs subject-specific optimizations on the
IR tree made up of subject ASNs. Each subject ASN 904 is
then converted in step 907 into target-compatible IR nodes
by invoking the subject ASN's conversion component.
Subject ASN nodes which have generic conversion components
by default are converted into base nodes 909. Subject ASN
nodes which have specialized conversion components, as
provided by the backend, are converted into target ASNs
911. The conversion thus produces a mixed IR forest 913,
containing both base nodes 909 and target ASNs 911. In the
kernel 32, the translator performs generic optimizations
in step 915 on the base nodes in the mixed IR forest 913.
The translator then performs target-specific optimizations
in step 916 on the target ASNs in the mixed IR forest 913.
Finally, code generation invokes the implementation
component of each node in the mixed tree (both base nodes
and target ASN nodes have implementation components) in
step 917, which then generates target code 919.

In the special case of a code accelerator, the subject
and target architectures are both the same. In this
scenario, subject ASNs persist throughout translation. In
the frontend 31, decoding generates subject ASNs from the
subject instructions. In the kernel 32, the subject ASNs
are passed through architecture-specific optimizations.
Code generation invokes the subject ASNs' implementation
components to generate the corresponding instructions. As
such, in a code accelerator the use of ASNs prevents code
explosion, by ensuring a minimum subject-to-target
instruction conversion ratio of 1:1, which can be
increased by optimizations.
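
The conversion step of Figure 9 can be sketched in C++ as
follows. This is a simplified illustration under stated
assumptions: the `IrNode` record, the `convertForest` and
`countOfType` helpers, and the idea of representing the
backend's specialized conversions as a set of instruction
names are all hypothetical, and a generic conversion is
shown as yielding a single base node where a real one
would yield a tree.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class NodeType { Base, TargetAsn };

struct IrNode {
    NodeType type;
    std::string subjectOp;  // the subject instruction this node came from
};

// Convert each subject ASN: a specialized conversion component (supplied
// by the backend) yields a target ASN; the default generic conversion
// component yields base nodes. The result is a mixed IR forest.
std::vector<IrNode> convertForest(const std::vector<std::string>& subjectAsns,
                                  const std::set<std::string>& specialized) {
    std::vector<IrNode> mixed;
    for (const std::string& op : subjectAsns) {
        const bool hasSpecialized = specialized.count(op) != 0;
        mixed.push_back(
            IrNode{hasSpecialized ? NodeType::TargetAsn : NodeType::Base, op});
    }
    return mixed;
}

// Generic optimizations visit only base nodes; target-specific
// optimizations visit only target ASNs. This helper counts either group.
std::size_t countOfType(const std::vector<IrNode>& forest, NodeType t) {
    std::size_t n = 0;
    for (const IrNode& node : forest)
        if (node.type == t) ++n;
    return n;
}
```

For instance, converting the subject ASNs `{"addi", "lw",
"shl64"}` with only `"shl64"` specialized would give a
forest of two base nodes and one target ASN.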

The various embodiments of the translator of the
present invention can be configured for specific
translator applications (i.e., particular subject
architecture-target architecture pairs). As such, the
translator of the present invention is configurable to
convert subject code designed to run on any subject
architecture to target code executable on any target
architecture. Across multiple translator applications,
each base node has multiple implementation components, one
for each supported target architecture. The particular
configuration being undertaken (i.e., conditional
compilation) determines which IR nodes and which
components of those nodes to include in a particular
translator application.
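
One way such conditional compilation might look in C++ is
sketched here. The macro names `XLAT_TARGET_X86` and
`XLAT_TARGET_MIPS`, the `BaseAddNode` type and the returned
strings are hypothetical; the source does not specify how
the configuration is expressed.

```cpp
#include <string>

// Hypothetical configuration macros; a real build would define exactly
// one of them. The sketch defaults to X86 when neither is given.
#if !defined(XLAT_TARGET_X86) && !defined(XLAT_TARGET_MIPS)
#define XLAT_TARGET_X86 1
#endif

// A base node carries one implementation component per supported target,
// but only the one for the configured target is compiled in.
struct BaseAddNode {
    std::string implement() const {
#if defined(XLAT_TARGET_X86)
        return "x86 add";
#elif defined(XLAT_TARGET_MIPS)
        return "mips add";
#endif
    }
};

// Nodes that can never be used in this configuration are excluded.
inline bool mipsTargetAsnsCompiledIn() {
#if defined(XLAT_TARGET_MIPS)
    return true;
#else
    return false;
#endif
}
```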

The use of ASNs in a preferred embodiment of the
present invention provides a plurality of advantageous
benefits. First, a translator product built from scratch
can be developed quickly using generic IR implementations
of subject instructions. Second, existing translator
products can be incrementally augmented, by implementing
target-specific conversion components for subject
instructions that are critical to performance (as known
beforehand or as empirically determined). Third, as more
translator products are developed, the library of ASN
nodes (and implemented functionality) grows over time, so
future translator products can be implemented or optimized
quickly.

This embodiment of the present invention allows backend
implementations to pick and choose which subject
instructions are worth optimizing (by defining
target-specialized conversion components). The generic
conversion component allows an ASN-based translator to be
developed quickly, while the specialized conversion
components allow performance-critical instructions to be
selectively and incrementally optimized.

Example 3: Difficult Instructions Using ASN

Returning to the PowerPC SHL64 instruction of Example
1 above, the translator 30 using ASNs performs the
following steps. The frontend decoder 200 decodes the
current block and encounters the PowerPC SHL64
instruction. The frontend 31 then realizes a single ASN
for that instruction, SHL64-PPC-P4. The kernel 32 then
optimizes the IR for the current block of instructions and
performs an ordering traversal of the IR in preparation
for code generation. The kernel 32 then performs code
generation for the ASN nodes by invoking each particular
ASN node's code generator function, which is an element of
the implementation component. The backend 33 then encodes
subject architecture (PPC) instructions into one or more
target architecture (P4) instructions.

MIPS Examples

Referring now to Figures 10, 11 and 12, examples
illustrating the different IR trees that are generated
from the same MIPS instruction sequence using base IR
nodes, MIPS-X86 ASN IR nodes, and MIPS-MIPS ASN IR nodes,
respectively, are shown. The semantics of the example
MIPS subject instruction sequence (load upper immediate,
then bitwise-or immediate) is to load the 32-bit constant
value 0x12345678 into subject register "a1".

In Figure 10, the Binary Decoder 300 is a frontend 31
component of the translator 30 which decodes (parses) the
subject code into individual subject instructions. After
the subject instructions are decoded, they are realized as
base nodes 302 and added to the working IR forest for the
current block of instructions. The IR Manager 304 is the
portion of the translator 30 that holds the working IR
forest during IR generation. The IR Manager 304 consists
of abstract registers and their associated IR trees (the
roots of the IR forest are abstract registers). For
example, in Figure 10, the abstract register "a1" 306 is
the root of an IR tree 308 of five nodes, which is part of
the current block's working IR forest. In a translator 30
implemented in C++, the IR Manager 304 may be implemented
as a C++ object that includes a set of abstract register
objects (or references to IR node objects).

Figure 10 illustrates an IR tree 308 generated by a
MIPS-to-X86 translator using base nodes only. The
MIPS_LUI instruction 310 realizes a "SHL" (shift left)
base node 314 with two operand nodes, which in this case
are both constants. The semantics of the MIPS_LUI
instruction 310 are to shift a constant value (0x1234)
left by a constant number of bits (16). The MIPS_ORI
instruction 312 realizes an "ORI" (bitwise-or immediate)
base node 320 with two operand nodes, the result of the
SHL node 314 and a constant value. The semantics of the
MIPS_ORI instruction 312 are to perform a bitwise-or of
the existing register contents with a constant value
(0x5678).
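
The five-node base tree of Figure 10 can be sketched in
C++ as follows. The `Node` layout and the `emit` and `eval`
helpers are hypothetical, and the sketch assumes a code
generator with no idiom recognition, in which every
constant node is materialized by its own load.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Op { Const, Shl, Or };

struct Node {
    Op op;
    std::uint32_t value;             // used by Const nodes only
    std::shared_ptr<Node> lhs, rhs;  // operands of Shl and Or
};
using NodePtr = std::shared_ptr<Node>;

NodePtr constant(std::uint32_t v) {
    return std::make_shared<Node>(Node{Op::Const, v, nullptr, nullptr});
}
NodePtr binary(Op op, NodePtr l, NodePtr r) {
    return std::make_shared<Node>(Node{op, 0, std::move(l), std::move(r)});
}

// lui a1, 0x1234 ; ori a1, a1, 0x5678 expressed with base nodes only.
NodePtr luiOriTree() {
    return binary(Op::Or,
                  binary(Op::Shl, constant(0x1234), constant(16)),
                  constant(0x5678));
}

// Post-order walk: operands first, then the operation itself.
void emit(const NodePtr& n, std::vector<std::string>& out) {
    if (n->op == Op::Const) { out.push_back("load"); return; }
    emit(n->lhs, out);
    emit(n->rhs, out);
    out.push_back(n->op == Op::Shl ? "shift" : "or");
}

// Evaluate the tree to check that it encodes the intended constant.
std::uint32_t eval(const NodePtr& n) {
    switch (n->op) {
    case Op::Const: return n->value;
    case Op::Shl:   return eval(n->lhs) << eval(n->rhs);
    case Op::Or:    return eval(n->lhs) | eval(n->rhs);
    }
    return 0;
}
```

Walking this tree yields one operation per node, in the
order load, load, shift, load, or, and the tree evaluates
to 0x12345678.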

In an unoptimized code generator, the base nodes
include no immediate-type operators other than load
immediate, so each constant node results in the generation
of a load immediate instruction. The unoptimized base
node translator therefore requires five RISC-type
operations (load, load, shift, load, or) for this subject
instruction sequence. Backend 33 idiom recognition can
reduce this number from five to two, by coalescing the
constant nodes with their parent nodes, to generate
immediate-type target instructions (i.e., shift immediate
and or immediate). This reduces the number of target
instructions to two, but for an increased translation cost
in performing the idiom recognition in the code generator.
Using complex nodes in the IR can realize immediate-type
IR nodes, which eliminates the need to perform idiom
recognition in the backend 33 and reduces the translation
cost of the code generator. Complex nodes preserve more
of the semantics of the original subject instructions,
and, with fewer IR nodes being realized, the translation
cost of node generation is also reduced when using complex
nodes.

Figure 11 illustrates the IR tree generated by a
MIPS-X86 (MIPS to X86) translator using ASNs. After the
subject instructions are decoded by the binary decoder
300, they are realized as MIPS_X86 ASN nodes 330, which
are then added to the working IR forest for the current
block. First, the MIPS_X86_LUI ASN node is converted into
an X86 32-bit constant node 332 by the ASN's convert
component. Second, the MIPS_X86_ORI ASN node produces an
X86 ORI node which is immediately folded with the previous
X86 constant node (constant folding), resulting in a
single X86 32-bit constant node 334. This node 334 is
encoded into a single X86 load constant instruction, "mov
%eax, $0x12345678". As can be seen, ASN nodes result in
fewer nodes than the base node example, thus reducing
translation cost and providing better target code.
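
The constant folding of Figure 11 can be sketched in C++
as follows. The `X86Node` record and the `convertLui`,
`convertOri` and `movConstant` helpers are hypothetical
names introduced for the illustration; only the folded
value and the resulting "mov" instruction text come from
the description above.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

enum class Kind { Const32, OrImm };

struct X86Node {
    Kind kind;
    std::uint32_t value;  // constant value, or the immediate of an OR
};

// Conversion of the LUI ASN: the 16-bit immediate becomes the upper half
// of a 32-bit constant node.
X86Node convertLui(std::uint32_t imm16) {
    return X86Node{Kind::Const32, imm16 << 16};
}

// Conversion of the ORI ASN: when the source operand is already a
// constant node, fold the OR into it instead of creating a second node.
X86Node convertOri(const X86Node& src, std::uint32_t imm16) {
    if (src.kind == Kind::Const32)
        return X86Node{Kind::Const32, src.value | imm16};
    return X86Node{Kind::OrImm, imm16};  // operand link omitted in this sketch
}

// Implementation component of the constant node: one load constant
// instruction.
std::string movConstant(const X86Node& n) {
    char buf[40];
    std::snprintf(buf, sizeof buf, "mov %%eax, $0x%08x",
                  static_cast<unsigned>(n.value));
    return std::string(buf);
}
```

Folding works here because (0x1234 << 16) | 0x5678 equals
0x12345678, so the two subject instructions collapse into
a single constant node.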

Figure 12 illustrates an IR tree generated by a
MIPS-MIPS translator (i.e., a MIPS accelerator) using
ASNs. After the subject instructions 310, 312 are decoded
by the binary decoder 300, they are realized as MIPS_MIPS
ASN nodes 340, which are then added to the working IR
forest for the current block. Because the source and
target architectures are the same for the MIPS-MIPS
translator, the MIPS_MIPS_LUI and MIPS_MIPS_ORI ASN nodes
340 have null (undefined) convert components. As such,
there is a direct correspondence between the subject
instructions and the final IR nodes used to generate code.
This guarantees a 1:1 subject-to-target instruction
translation ratio, even before any optimizations are
applied. In other words, ASN nodes eliminate code
explosion for same-same translators (accelerators). ASN
nodes also allow 16-bit constant nodes to be shared, which
is useful for efficient translation of contiguous memory
accesses on the MIPS platform.

Basic blocks of instructions are translated one
subject instruction at a time. Each subject instruction
results in the formation of (realizes) an IR tree. After
the IR tree for a given instruction is created, it is then
integrated into the working IR forest for the current
block. The roots of the working IR forest are abstract
registers, which correspond to the subject registers and
other features of the subject architecture. When the last
subject instruction has been decoded, realized, and its IR
tree integrated with the working IR forest, the IR forest
for that block is complete.

In Figure 12, the first subject instruction 310 is
"lui a1, 0x1234". The semantics of this instruction 310
are to load the constant value 0x1234 into the upper 16
bits of subject register "a1" 342. This instruction 310
realizes a MIPS_MIPS_LUI node 344, with an immediate field
constant value of 0x1234. The translator adds this node
to the working IR forest by setting abstract register "a1"
342 (the destination register of the subject instruction)
to point to the MIPS_MIPS_LUI IR node 344.

In the same example in Figure 12, the second subject
instruction 312 is "ori a1, a1, 0x5678". The semantics of
this instruction 312 are to perform a bitwise-or of the
constant value 0x5678 with the current contents of subject
register "a1" 342 and to store the result in subject
register "a1" 346. This instruction 312 realizes a
MIPS_MIPS_ORI node 348, with an immediate field constant
value of 0x5678. The translator adds this node to the
working IR forest by first setting the ORI node to point
to the IR tree that is currently pointed to by abstract
register "a1" 342 (the source register of the subject
instruction), and then setting the abstract register "a1"
346 (the destination register of the subject instruction)
to point to the ORI node 348. In other words, the
existing "a1" tree rooted with abstract register 342
(i.e., the LUI node) becomes a subtree 350 of the ORI node
348, and then the ORI node 348 becomes the new "a1" tree.
The old "a1" tree (after LUI but before ORI) is rooted
from abstract register 342 and shown as linked by line
345, while the current "a1" tree (after ORI) is rooted
from abstract register 346.
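
The two-step update of the abstract register described
above can be sketched in C++ as follows. The `IrManager`
class shown here, its `lui`, `ori` and `lookup` methods,
and the use of a map keyed by register name are assumptions
made for illustration; the source states only that the IR
Manager may be a C++ object holding abstract registers
that refer to IR nodes.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>

struct IrNode {
    std::string name;                // e.g. "MIPS_MIPS_LUI"
    std::uint32_t imm;               // immediate field constant
    std::shared_ptr<IrNode> source;  // subtree for the source register, if any
};
using IrPtr = std::shared_ptr<IrNode>;

class IrManager {
public:
    // lui dst, imm: the new node becomes the tree for dst.
    void lui(const std::string& dst, std::uint32_t imm) {
        regs_[dst] = std::make_shared<IrNode>(IrNode{"MIPS_MIPS_LUI", imm, nullptr});
    }

    // ori dst, src, imm: first link the new node to the tree currently
    // rooted at src, then retarget dst to the new node.
    void ori(const std::string& dst, const std::string& src, std::uint32_t imm) {
        IrPtr node = std::make_shared<IrNode>(IrNode{"MIPS_MIPS_ORI", imm, lookup(src)});
        regs_[dst] = node;
    }

    // Current tree for an abstract register, or null if none yet.
    IrPtr lookup(const std::string& reg) const {
        auto it = regs_.find(reg);
        return it == regs_.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, IrPtr> regs_;  // abstract registers root the forest
};
```

The order of the two steps in `ori` matters when source
and destination are the same register: reading the old
tree before retargeting is what makes the LUI node a
subtree of the ORI node.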

As can be seen from the foregoing, an improved program
code conversion apparatus formed in accordance with the
present invention is configurable to any subject and
target processor architecture pairing while maintaining an
optimal level of performance and balancing the speed of
translation with the efficiency of the translated target
code. Moreover, depending upon the particular
architectures of the subject and target computing
environments involved in the conversion, the program code
conversion apparatus of the present invention can be
designed with a hybrid design of generic and specific
conversion features by utilizing a combination of base
nodes, complex nodes, polymorphic nodes, and architecture-
specific nodes in its intermediate representation.
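
By way of illustration only, a hybrid design of this kind
might select a node type per decoded instruction roughly
as sketched below. The enumeration NodeType, the function
choose_node_type, the priority order of the checks and the
example mnemonic sets are all assumptions made for this
sketch.

```python
from enum import Enum


class NodeType(Enum):
    BASE = "base"
    COMPLEX = "complex"
    POLYMORPHIC = "polymorphic"
    ARCH_SPECIFIC = "architecture-specific"


def choose_node_type(mnemonic, polymorphic_ops, arch_specific_ops, complex_ops):
    """Pick an IR node type for one decoded subject instruction.

    The priority order used here is an assumption of this sketch.
    """
    if mnemonic in polymorphic_ops:
        return NodeType.POLYMORPHIC
    if mnemonic in arch_specific_ops:
        return NodeType.ARCH_SPECIFIC
    if mnemonic in complex_ops:
        return NodeType.COMPLEX
    return NodeType.BASE        # generic fallback for everything else


# Hypothetical configuration for one subject/target pairing.
config = dict(
    polymorphic_ops={"mult"},
    arch_specific_ops={"lui"},
    complex_ops={"addi"},
)
plan = {op: choose_node_type(op, **config) for op in ["add", "addi", "lui", "mult"]}
```

Changing only the configuration sets adapts the same
selection logic to a different architecture pairing.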

The different structures of the improved program code
conversion apparatus of the present invention are
described separately in each of the above embodiments.
However, it is the full intention of the inventors of the
present invention that the separate aspects of each
embodiment described herein may be combined with the other
embodiments described herein. For instance, the
translator formed in accordance with the present invention
may comprise hybrid optimisations of various IR types.
Those skilled in the art will appreciate that various
adaptations and modifications of the just-described
preferred embodiment can be configured without departing
from the scope and spirit of the invention. Therefore, it
is to be understood that, within the scope of the appended
claims, the invention may be practiced other than as
specifically described herein.

Although a few preferred embodiments have been shown
and described, it will be appreciated by those skilled in
the art that various changes and modifications might be
made without departing from the scope of the invention, as
defined in the appended claims.

Attention is directed to all papers and documents
which are filed concurrently with or previous to this
specification in connection with this application and
which are open to public inspection with this
specification, and the contents of all such papers and
documents are incorporated herein by reference.

All of the features disclosed in this specification
(including any accompanying claims, abstract and
drawings), and/or all of the steps of any method or
process so disclosed, may be combined in any combination,
except combinations where at least some of such features
and/or steps are mutually exclusive.

Each feature disclosed in this specification
(including any accompanying claims, abstract and drawings)
may be replaced by alternative features serving the same,
equivalent or similar purpose, unless expressly stated
otherwise. Thus, unless expressly stated otherwise, each
feature disclosed is one example only of a generic series
of equivalent or similar features.

The invention is not restricted to the details of the
foregoing embodiment(s). The invention extends to any
novel one, or any novel combination, of the features
disclosed in this specification (including any
accompanying claims, abstract and drawings), or to any
novel one, or any novel combination, of the steps of any
method or process so disclosed.

Claims

1. A method of generating an intermediate
representation of program code, comprising the steps of:

decoding instructions in the program code; 

generating an intermediate representation (IR) of the 
decoded program code to include at least one type of IR 
nodes out of a plurality of possible types of IR nodes; 
and 

determining which type of IR nodes to generate in the 
intermediate representation for each respective 
instruction in the decoded program code, wherein the IR 
nodes in the intermediate representation (IR) are abstract 
representations of the expressions, calculations, and 
operations performed by the program code. 

2. The method of claim 1, wherein the plurality of
possible types of IR nodes include base nodes and complex
nodes.

3. The method of claim 2, wherein base nodes
represent the most basic semantics of any subject
architecture running the program code, such that the
semantics of base nodes cannot be decomposed into other
nodes representing more simple semantics.

4. The method of claim 3, wherein base nodes are
generic across a plurality of possible subject
architectures.

5. The method of claim 3 or 4, wherein complex nodes
provide a more compact representation of the semantics of
complex instructions in the program code than that of base
node representations.

6. The method of claim 5, wherein complex nodes
represent immediate-type instructions in which a constant
operand value is encoded into the immediate-type
instruction itself in an immediate field.

7. The method of claim 5 or 6, wherein a complex node
may be decomposed into a plurality of base nodes to
represent the same semantics of an instruction in the
decoded program code.
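
By way of illustration only, the decomposition of a
complex node into base nodes might be sketched as follows.
The node classes Const, Load, Add and AddImm and the
functions decompose and count_nodes are hypothetical names
chosen for this sketch; an add-immediate instruction is
used as an assumed example of an immediate-type
instruction.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:            # base node: a constant value
    value: int


@dataclass(frozen=True)
class Load:             # base node: read an abstract register
    register: str


@dataclass(frozen=True)
class Add:              # base node: addition of two operands
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class AddImm:           # complex node: constant held in an immediate field
    source: "Node"
    immediate: int


Node = Union[Const, Load, Add, AddImm]


def decompose(node):
    """Rewrite complex nodes into base nodes with the same semantics."""
    if isinstance(node, AddImm):
        return Add(decompose(node.source), Const(node.immediate))
    if isinstance(node, Add):
        return Add(decompose(node.left), decompose(node.right))
    return node


def count_nodes(node):
    """Count the nodes in a tree, to compare the two representations."""
    if isinstance(node, AddImm):
        return 1 + count_nodes(node.source)
    if isinstance(node, Add):
        return 1 + count_nodes(node.left) + count_nodes(node.right)
    return 1


compact = AddImm(Load("r1"), 10)
expanded = decompose(compact)
```

In this sketch the complex form uses two nodes where the
equivalent base-node form uses three, which is the sense
in which the complex representation is more compact.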

8. The method of claim 5, 6 or 7, wherein the program
code is designed to be executed by a subject architecture,
the method further comprising the step of generating
complex nodes only for those features correspondingly
configurable on the subject architecture.

9. The method of claim 2 or any claim dependent
thereon, wherein the plurality of possible types of IR
nodes further include polymorphic nodes.

10. The method of claim 9, wherein the program code is
subject code designed for execution on a subject
architecture and is dynamically translated into target
code for execution on a target architecture, said method
further comprising:

generating the intermediate representation to include
polymorphic nodes, wherein polymorphic nodes contain a
function pointer to a function of the target architecture
specific to a particular instruction in the subject code.

11. The method of claim 10, said method further
comprising generating polymorphic nodes when the features
of the target architecture would cause the semantics of a
particular subject instruction to be lost if realized as
base nodes.

12. The method of claim 10 or 11, wherein each
polymorphic node is specific to a combination of a
particular instruction in the subject code and a function
of the target architecture.

13. The method of claim 10, 11 or 12, wherein said
determining the type of IR nodes step further comprises
identifying an instruction in subject code which
corresponds to an instruction on a list of polymorphic
instructions to be realized as polymorphic nodes; and

when a subject instruction corresponds to an
instruction on the list of polymorphic instructions, said
IR generating step generates polymorphic nodes only for
those subject instructions corresponding to those on the
list of polymorphic instructions.
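
By way of illustration only, a list of polymorphic
instructions and the function pointer carried by a
polymorphic node might be sketched as follows. The classes
PolymorphicNode and BaseNode, the table POLYMORPHIC_LIST,
the generator gen_mult_for_target and the emitted
instruction text are hypothetical and serve only this
sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PolymorphicNode:
    subject_instruction: str
    # A callable stands in for the function pointer to target-specific code.
    generate: Callable[[List[str]], List[str]]


@dataclass
class BaseNode:
    subject_instruction: str


def gen_mult_for_target(operands):
    """Hypothetical target-specific generator for one subject instruction."""
    dest, lhs, rhs = operands
    return [f"imul {dest}, {lhs}, {rhs}"]


# The list of polymorphic instructions, keyed by subject mnemonic (assumed).
POLYMORPHIC_LIST: Dict[str, Callable[[List[str]], List[str]]] = {
    "mult": gen_mult_for_target,
}


def make_node(mnemonic):
    """Create a polymorphic node only for instructions on the list."""
    handler = POLYMORPHIC_LIST.get(mnemonic)
    if handler is not None:
        return PolymorphicNode(mnemonic, handler)
    return BaseNode(mnemonic)


nodes = [make_node(m) for m in ["add", "mult"]]
emitted = nodes[1].generate(["r3", "r1", "r2"])
```

Instructions absent from the list fall through to ordinary
base nodes, so only listed instructions pay for a
specialized generator.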


14. The method of any preceding claim, wherein the 
plurality of possible types of IR nodes further include 
base nodes and architecture-specific nodes. 

15. The method of claim 14, wherein the program code
is subject code designed for execution on a subject
architecture and is dynamically translated into target
code for execution on a target architecture, said method
further comprising:

generating the intermediate representation to include
architecture-specific nodes which are specific to a
particular combination of a subject architecture and a
target architecture.

16. The method of claim 15, the intermediate
representation generating step further comprising:

initially representing all of the instructions in the
subject code as subject architecture-specific nodes, where
each subject architecture-specific node corresponds to a
respective instruction in the subject code;

determining whether an instruction in the subject code
is one in which to provide a target architecture-
specialized conversion function, converting subject
architecture-specific nodes into target architecture-
specific nodes for those instructions determined to
provide a target architecture-specialized conversion
function; and

generating base nodes from the remaining subject
architecture-specific nodes which are not identified as
providing a target architecture-specialized code
generation function.
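
By way of illustration only, the three steps of claim 16
might be sketched as follows. The classes SubjectNode,
TargetNode and BaseNode, the table SPECIALIZED, the
function lower and the mnemonic "load_upper" are
hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class SubjectNode:      # subject architecture-specific node
    mnemonic: str
    operands: Tuple


@dataclass
class TargetNode:       # target architecture-specific node
    mnemonic: str
    operands: Tuple


@dataclass
class BaseNode:         # generic node, not tied to either architecture
    op: str
    operands: Tuple


# Hypothetical table of target-specialized conversion functions.
SPECIALIZED: Dict[str, Callable[[SubjectNode], TargetNode]] = {
    "lui": lambda n: TargetNode("load_upper", n.operands),
}


def lower(instructions):
    # Step 1: every subject instruction starts as a subject-specific node.
    subject_nodes = [SubjectNode(m, ops) for m, ops in instructions]
    result = []
    for node in subject_nodes:
        convert = SPECIALIZED.get(node.mnemonic)
        if convert is not None:
            # Step 2: a specialized conversion exists, so use it directly.
            result.append(convert(node))
        else:
            # Step 3: the remaining nodes become generic base nodes.
            result.append(BaseNode(node.mnemonic, node.operands))
    return result


ir = lower([("lui", ("a1", 0x1234)), ("add", ("a2", "a1", "a0"))])
```

Here the first instruction takes the specialized path and
the second falls back to a base node.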



17. The method of claim 16, further comprising
generating corresponding target code from the target
architecture-specific nodes which is specialized for the
target architecture.

18. The method of claim 15, 16 or 17, further
comprising generating corresponding target code from the
base nodes which is not specialized for the target
architecture.

19. A computer-readable recording medium containing
program code for performing the method of any preceding
claim.



20. A computer-readable storage medium having
translator software resident thereon in the form of
computer readable code executable by a computer to perform
the following steps during translation of subject program
code to target program code:

decoding instructions in the subject program code;

generating an intermediate representation (IR) of the
decoded subject program code to include at least one type
of IR nodes out of a plurality of possible types of IR
nodes;

determining which type of IR nodes to generate in the
intermediate representation for each respective
instruction in the decoded subject program code, wherein
the IR nodes in the intermediate representation (IR) are
abstract representations of the expressions, calculations,
and operations performed by the program code; and

generating target program code using the intermediate
representation (IR).

21. The computer-readable storage medium of claim 20,
wherein the plurality of possible types of IR nodes
include base nodes and complex nodes.

22. The computer-readable storage medium of claim 21,
wherein base nodes represent the most basic semantics of
any subject architecture running the program code, such
that the semantics of base nodes cannot be decomposed into
other nodes representing more simple semantics.

23. The computer-readable storage medium of claim 22, 
wherein base nodes are generic across a plurality of 
possible subject architectures. 

24. The computer-readable storage medium of claim 22
or 23, wherein complex nodes provide a more compact
representation of the semantics of complex instructions in
the program code than that of base node representations.

25. The computer-readable storage medium of claim 24,
wherein complex nodes represent immediate-type
instructions in which a constant operand value is encoded
into the immediate-type instruction itself in an immediate
field.

26. The computer-readable storage medium of claim 24
or 25, wherein a complex node may be decomposed into a
plurality of base nodes to represent the same semantics of
an instruction in the decoded program code.

27. The computer-readable storage medium of claim 24,
25 or 26, wherein the subject program code is designed to
be executed by a subject architecture, the method further
comprising the step of generating complex nodes only for
those features correspondingly configurable on the subject
architecture.

28. The computer-readable storage medium of any of
claims 21 to 27, wherein the plurality of possible types
of IR nodes further include polymorphic nodes.

29. The computer-readable storage medium of claim 28,
wherein the subject program code is designed for execution
on a subject architecture and is dynamically translated
into target code for execution on a target architecture,
said translator software further containing computer
readable code executable by a computer to perform the
following steps:

generating the intermediate representation to include
polymorphic nodes, wherein polymorphic nodes contain a
function pointer to a function of the target architecture
specific to a particular instruction in the subject code.






30. The computer-readable storage medium of claim 29, 
said translator software further containing computer 
readable code executable by a computer to generate 
polymorphic nodes when the features of the target 

architecture would cause the semantics of a particular 
subject instruction to be lost if realized as base nodes. 

31. The computer-readable storage medium of claim 29
or 30, wherein each polymorphic node is specific to a
combination of a particular instruction in the subject
code and a function of the target architecture.

32. The computer-readable storage medium of claim 29,
30 or 31, wherein said computer readable code executable by
a computer for determining the type of IR nodes further:

identifies an instruction in subject code which
corresponds to an instruction on a list of polymorphic
instructions to be realized as polymorphic nodes; and

when a subject instruction corresponds to an
instruction on the list of polymorphic instructions,
generates polymorphic nodes only for those subject
instructions corresponding to those on the list of
polymorphic instructions.

33. The computer-readable storage medium of any of
claims 20 to 32, wherein the plurality of possible types
of IR nodes further include base nodes and architecture-
specific nodes.

34. The computer-readable storage medium of claim 33,
wherein the subject program code is designed for execution
on a subject architecture and is dynamically translated
into target code for execution on a target architecture,
said translator software further containing computer
readable code executable by a computer to perform the
following steps:

generating the intermediate representation to include
architecture-specific nodes which are specific to a
particular combination of a subject architecture and a
target architecture.

35. The computer-readable storage medium of claim 34,
said translator software further containing computer
readable code executable by a computer to perform the
following steps:

initially representing all of the instructions in the
subject code as subject architecture-specific nodes, where
each subject architecture-specific node corresponds to a
respective instruction in the subject code;

determining whether an instruction in the subject code
is one in which to provide a target architecture-
specialized conversion function, converting subject
architecture-specific nodes into target architecture-
specific nodes for those instructions determined to
provide a target architecture-specialized conversion
function; and

generating base nodes from the remaining subject
architecture-specific nodes which are not identified as
providing a target architecture-specialized code
generation function.

36. The computer-readable storage medium of claim 35,
said translator software further containing computer
readable code executable by a computer to generate
corresponding target code from the target architecture-
specific nodes which is specialized for the target
architecture.

37. The computer-readable storage medium of claim 34,
35 or 36, said translator software further containing
computer readable code executable by a computer to
generate corresponding target code from the base nodes
which is not specialized for the target architecture.

38. A translator apparatus for use in a target
computing environment having a processor and a memory
coupled to the processor for translating subject program
code appropriate in a subject computing environment to
produce target program code appropriate to the target
computing environment, the translator apparatus
comprising:

a decoding mechanism configured to decode instructions
in the subject program code;

an intermediate representation generating mechanism
configured to generate an intermediate representation (IR)
of the decoded program code to include at least one type
of IR nodes out of a plurality of possible types of IR
nodes; and

an intermediate representation (IR) type determining
mechanism configured to determine which type of IR nodes
to generate in the intermediate representation for each
respective instruction in the decoded program code,
wherein the IR nodes in the intermediate representation
(IR) are abstract representations of the expressions,
calculations, and operations performed by the program
code.

39. The translator apparatus of claim 38, wherein the
plurality of possible types of IR nodes include base nodes
and complex nodes.

40. The translator apparatus of claim 39, wherein base
nodes represent the most basic semantics of any subject
architecture running the program code, such that the
semantics of base nodes cannot be decomposed into other
nodes representing more simple semantics.

41. The translator apparatus of claim 40, wherein base
nodes are generic across a plurality of possible subject
architectures.

42. The translator apparatus of claim 40 or 41,
wherein complex nodes provide a more compact
representation of the semantics of complex instructions in
the program code than that of base node representations.

43. The translator apparatus of claim 42, wherein
complex nodes represent immediate-type instructions in
which a constant operand value is encoded into the
immediate-type instruction itself in an immediate field.

44. The translator apparatus of claim 42 or 43,
wherein a complex node may be decomposed into a plurality
of base nodes to represent the same semantics of an
instruction in the decoded program code.

45. The translator apparatus of claim 42, 43 or 44,
wherein the program code is designed to be executed by a
subject architecture, the intermediate representation
generating mechanism further comprising a complex node
generating mechanism for generating complex nodes only for
those features correspondingly configurable on the subject
architecture.

46. The translator apparatus of any of claims 39 to
45, wherein the plurality of possible types of IR nodes
further include polymorphic nodes.

47. The translator apparatus of claim 46, wherein the
program code is subject code designed for execution on a
subject architecture and is dynamically translated into
target code for execution on a target architecture, the
intermediate representation generating mechanism further
comprising:

a polymorphic node generating mechanism for generating
the intermediate representation to include polymorphic
nodes, wherein polymorphic nodes contain a function
pointer to a function of the target architecture specific
to a particular instruction in the subject code.

48. The translator apparatus of claim 47, said
polymorphic node generating mechanism generating
polymorphic nodes when the features of the target
architecture would cause the semantics of a particular
subject instruction to be lost if realized as base nodes.

49. The translator apparatus of claim 47 or 48,
wherein each polymorphic node is specific to a combination
of a particular instruction in the subject code and a
function of the target architecture.

50. The translator apparatus of claim 47, 48 or 49,
wherein said intermediate representation (IR) type
determining mechanism further comprises a polymorphic
identification mechanism for identifying an instruction in
subject code which corresponds to an instruction on a list
of polymorphic instructions to be realized as polymorphic
nodes; and

when a subject instruction corresponds to an
instruction on the list of polymorphic instructions, said
intermediate representation generating mechanism generates
polymorphic nodes only for those subject instructions
corresponding to those on the list of polymorphic
instructions.

51. The translator apparatus of any of claims 38 to
50, wherein the plurality of possible types of IR nodes
further include base nodes and architecture-specific
nodes.

52. The translator apparatus of claim 51, wherein the
program code is subject code designed for execution on a
subject architecture and is dynamically translated into
target code for execution on a target architecture, said
intermediate representation generating mechanism further
comprising:

an architecture-specific node generating mechanism for
generating the intermediate representation to include
architecture-specific nodes which are specific to a
particular combination of a subject architecture and a
target architecture.

53. The translator apparatus of claim 52, the
intermediate representation generating mechanism being
configured to:






initially represent all of the instructions in the
subject code as subject architecture-specific nodes, where
each subject architecture-specific node corresponds to a
respective instruction in the subject code;

determine whether an instruction in the subject code
is one in which to provide a target architecture-
specialized conversion function, convert subject
architecture-specific nodes into target architecture-
specific nodes for those instructions determined to
provide a target architecture-specialized conversion
function; and

generate base nodes from the remaining subject
architecture-specific nodes which are not identified as
providing a target architecture-specialized code
generation function.

54. The translator apparatus of claim 53, further
comprising a specialized target code generating mechanism
for generating corresponding target code from the target
architecture-specific nodes which is specialized for the
target architecture.

55. The translator apparatus of claim 52, 53 or 54,
further comprising a non-specialized target code
generating mechanism for generating corresponding target
code from the base nodes which is not specialized for the
target architecture.

IMPROVED ARCHITECTURE FOR GENERATING INTERMEDIATE
REPRESENTATIONS FOR PROGRAM CODE CONVERSION

An improved architecture for a program code conversion
apparatus and method for generating intermediate
representations for program code conversion. The program
code conversion apparatus determines which types of IR
nodes to generate in an intermediate representation of
subject code to be translated. Depending upon the
particular subject and target computing environments
involved in the conversion, the program code conversion
apparatus utilizes either base nodes, complex nodes,
polymorphic nodes, and architecture-specific nodes, or
some combination thereof, in generating the intermediate
representation.